Pooling Hybrid Representations for Web Structured Data Annotation
Abstract
Automatically identifying the data types of web structured data is a key step in web data integration. Web structured data is usually associated with entities or objects in a particular domain. In this paper, we aim to map the attributes of an entity in a given domain to pre-specified attribute classes in the same domain based on their values. To perform this task, we propose a hybrid deep learning network that relies on the format of the attributes’ values, without any pre-processing or predefined hand-crafted features. The hybrid network combines two sequence-based neural networks, a convolutional neural network (CNN) and a recurrent neural network (RNN), to learn the sequence structure of attributes’ values. The CNN captures short-distance dependencies in these sequences through a sliding window, and the RNN captures long-distance dependencies by storing information about previous characters. These networks create different vector representations of the input sequence, which are combined by a pooling layer. This layer applies a specific operation to these vectors in order to capture their most useful patterns for the task. Finally, on top of the pooling layer, a softmax function predicts the label of a given attribute value. We evaluate our strategy in four different web domains. The results show that the pooling network outperforms previous approaches, which use some kind of input pre-processing, in all domains.
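The pipeline described in the abstract (character sequence, CNN and RNN encoders, pooling over the two representations, softmax) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the vocabulary, the label set, the layer sizes, the use of a plain tanh RNN, and the choice of element-wise max as the pooling operation are all assumptions made for illustration, and the weights are random and untrained, so the predicted probabilities carry no meaning.

```python
import math
import random

# Hypothetical character vocabulary; unknown characters become all-zero vectors.
VOCAB = "abcdefghijklmnopqrstuvwxyz0123456789-./:$ "
LABELS = ["phone", "price", "date"]  # hypothetical attribute classes


def one_hot(ch):
    vec = [0.0] * len(VOCAB)
    idx = VOCAB.find(ch)
    if idx >= 0:
        vec[idx] = 1.0
    return vec


def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def init_params(hidden=8, window=3, seed=0):
    # Random, untrained weights (assumption: both encoders share one hidden size).
    rng = random.Random(seed)
    dim = len(VOCAB)
    return {
        "window": window,
        "W_conv": rand_matrix(hidden, window * dim, rng),
        "W_in": rand_matrix(hidden, dim, rng),
        "W_rec": rand_matrix(hidden, hidden, rng),
        "W_out": rand_matrix(len(LABELS), hidden, rng),
    }


def cnn_encode(xs, W_conv, window):
    # Slide a fixed-size window over the characters (short-distance patterns),
    # then take the max of each filter response over all window positions.
    dim = len(VOCAB)
    padded = xs + [[0.0] * dim] * max(0, window - len(xs))
    feats = []
    for start in range(len(padded) - window + 1):
        flat = [v for vec in padded[start:start + window] for v in vec]
        feats.append([math.tanh(a) for a in matvec(W_conv, flat)])
    return [max(col) for col in zip(*feats)]


def rnn_encode(xs, W_in, W_rec):
    # The hidden state carries information about all previous characters.
    h = [0.0] * len(W_in)
    for x in xs:
        a, b = matvec(W_in, x), matvec(W_rec, h)
        h = [math.tanh(p + q) for p, q in zip(a, b)]
    return h


def pool(vectors):
    # Assumed pooling operation: element-wise max across the representations.
    return [max(col) for col in zip(*vectors)]


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict(value, params):
    xs = [one_hot(c) for c in value.lower()]
    cnn_vec = cnn_encode(xs, params["W_conv"], params["window"])
    rnn_vec = rnn_encode(xs, params["W_in"], params["W_rec"])
    return softmax(matvec(params["W_out"], pool([cnn_vec, rnn_vec])))
```

Calling `predict("555-1234", init_params())` returns one probability per entry in `LABELS`. In a trained version of such a model, the weights would be learned from labeled attribute values, and the pooling operation would be chosen to match the paper's.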
Similar resources
HAWK - Hybrid Question Answering Using Linked Data
The decentralized architecture behind the Web has led to pieces of information being distributed across data sources with varying structure. Hence, answering complex questions often requires combining information from structured and unstructured data sources. We present HAWK, a novel entity search approach for Hybrid Question Answering based on combining Linked Data and textual data. The approach u...
Improving Web Search Ranking by Incorporating Structured Annotation of Queries
Web users are increasingly looking for structured data, such as lyrics, jobs, or recipes, using unstructured queries on the web. However, retrieving relevant results from such data is a challenging problem due to the unstructured language of web queries. In this paper, we propose a method to improve web search ranking by detecting Structured Annotation of queries based on top search results....
Knowledge Extraction for Hybrid Question Answering
Since the proposal of hypertext by Tim Berners-Lee to his employer CERN on March 12, 1989, the World Wide Web has grown to more than one billion Web pages and is still growing. With the later proposed Semantic Web vision [1], Lee et al. suggested an extension of the existing (Document) Web to allow better reuse, sharing and understanding of data. Both the Document Web and the Web of Data (which is ...
Annotation-Based Automatic Action Processing
With a strong motivational background in search engine optimization, the amount of structured data on the web is growing rapidly. The main search engine providers are promising a great increase in visibility through annotation of a web page’s content with the vocabulary of schema.org, thus providing it as structured data. But besides its use by search engines, the data can be used in various...
Annotation as Process, Thing, and Knowledge: Multi-domain studies of structured data annotation
Following Buckland’s (1991) work on the nature of information, this paper characterizes the multi-faceted concept of ‘annotation’ as process, thing, and knowledge. This typology is then used to enumerate general research questions for the exploration of annotation in arbitrary domains. Our research team’s investigation of annotation of structured data in specific domains and user groups is desc...
Journal title: CoRR
Volume: abs/1610.00493
Publication date: 2016